Decoding of watermarked information signals

ABSTRACT

A method and apparatus are described for compensating for time offset in a received signal, so as to correctly align the frame sequence of the received signal to a sequence of transmitted symbols. Each symbol extends over T_(s) signal samples. The received signal is first divided into a sequence of frames of length T_(s), and each frame is then divided into a multiplicity of N_(b) sub-frames. Subsequently, N_(b) sequences of values are formed, where every successive value in each sequence is derived from the corresponding sub-frame within each successive frame. Each of the N_(b) sequences is an estimate for the correctly aligned sequence of transmitted symbols.

The present invention relates to apparatus and methods for decoding information that has been embedded in information signals, such as audio, video or data signals.

Watermarking of information signals is a technique for the transmission of additional data along with the information signal. For instance, watermarking techniques can be used to embed copyright and copy control information into audio signals.

The main requirement of a watermarking scheme is that it is not observable (i.e. in the case of an audio signal, it is inaudible) whilst being robust to attacks to remove the watermark from the signal (e.g. removing the watermark will damage the signal). It will be appreciated that the robustness of a watermark will normally be a trade off against the quality of the signal in which the watermark is embedded. For instance, if a watermark is strongly embedded into an audio signal (and is thus difficult to remove) then it is likely that the quality of the audio signal will be reduced.

Various types of audio watermarking schemes have been proposed, each with its own advantages and disadvantages. For instance, one type of audio watermarking scheme is to use temporal correlation techniques to embed the desired data (e.g. copyright information) into the audio signal. This technique is effectively an echo-hiding algorithm, in which the strength of echo is determined by solving a quadratic equation. The quadratic equation is generated by auto-correlation values at two positions: one at delay equal to τ, and one at delay equal to 0. At the detector, the watermark is extracted by determining the ratio of the auto-correlation function at the two delay positions.

WO 00/00969 in the name of Aris Technologies describes a technique for embedding or encoding auxiliary signals into an information host or cover signal. A replica of the cover signal, or a portion of the cover signal in a particular domain (time, frequency or space), is generated according to a stego key, which specifies modification values to the parameters of the cover signal. The replica signal is then modified by an auxiliary signal corresponding to the information to be embedded, and inserted back into the cover signal so as to form the stego signal.

At the decoder, in order to extract the original auxiliary data, a replica of the stego signal is generated in the same manner as the replica of the original cover signal, and requires the use of the same stego key. The resulting replica is then correlated with the received stego signal, so as to extract the auxiliary signal. The extraction of the auxiliary signal is relatively complex, and requires the stego key at both the encoder (or embedder) and decoder (or detector). Additionally, a brute force search is required to synchronize to the auxiliary signal at the detector.

Further, performance of the payload extraction is dependent on how well the auxiliary signal can be estimated. In a system with a high expected error rate of the payload bits in the auxiliary signal, this is very difficult to achieve. Solutions would lead to very complex error correction methods, or significantly limit the information capacity.

It is an object of the present invention to provide a compensation for time offset for a watermark decoding scheme that substantially addresses at least one of the problems of the prior art.

In a first aspect, the present invention provides a method of compensating for offset in a received signal, the signal being modified by a sequence of symbols, each symbol extending over T_(s) signal samples, the method comprising the steps of: (a) dividing the received signal into frames of predetermined length T_(s); (b) dividing each frame into a plurality of N_(b) sub-frames; (c) forming N_(b) sequences of values, the values being derived from the corresponding sub-frame within each frame; and (d) taking said N_(b) sequences as successive estimates of a frame sequence correctly aligned (with no offset) to the sequence of symbols.

Preferably, each frame overlaps an adjacent frame.

Preferably, each sub-frame overlaps an adjacent sub-frame.

Preferably, N_(b) lies within the range 2 to 8.

Preferably, the sequence of symbols comprises L_(w) symbols, the received signal being divided into L_(F) frames, wherein L_(F) is an integral multiple of T_(s)·L_(w).

Preferably, each symbol of the said sequence extends over T_(s) samples.

Preferably, said symbols are shaped with a window shaping function that has a band limited frequency behavior and, within the pass band, has a smooth (graceful) temporal behavior. Further, the window shaping function preferably has a symmetric or anti-symmetric temporal behavior.

Preferably, said window shaping function is one of raised cosine functions or bi-phase functions.

Preferably, said offset is a time offset between the received and the transmitted signals.

Preferably, the method further comprises processing each estimate generated in step (d) as though it were the correctly aligned frame sequence, so as to determine which estimate is the best estimate.

The method preferably further comprises the step of correlating each of said estimates with a reference sequence corresponding to said sequence of symbols; and taking the estimate with the maximum correlation peak value as the best estimate.

Preferably, the best estimate is assumed to be the first estimate that, when processed, exceeds one or more predetermined conditions. Preferably, the processing of estimates stops once a working estimate has been determined.

Preferably, once a working estimate has been determined for a first signal or portion of a signal, the method is repeated for a further received signal or portion of a signal, the estimates from said further signal being processed in an order dependent upon said first best estimate. Thus the method can adaptively correct for offset.
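One possible reading of this adaptive ordering, given as a minimal sketch only: the estimates of the further signal are tried nearest-first around the index of the earlier best estimate. The function name `processing_order` and the distance-based rule are assumptions made for illustration; the text only requires that the order depend on the first best estimate.

```python
def processing_order(n_b, previous_best):
    """Return estimate indices 0..n_b-1, nearest to previous_best first.

    Hypothetical helper: the nearest-first rule is an assumed ordering,
    not one prescribed by the description.
    """
    if not 0 <= previous_best < n_b:
        raise ValueError("previous_best must be a valid estimate index")
    # Ties (equal distance) are broken by the lower index; an arbitrary choice.
    return sorted(range(n_b), key=lambda i: (abs(i - previous_best), i))


# With five estimates and the fourth one (index 3) found best last time:
order = processing_order(5, previous_best=3)
```

Processing can then stop at the first estimate in `order` that satisfies the predetermined conditions.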

In a further aspect, the present invention provides a computer program arranged to perform the above method.

In another aspect, the present invention provides a record carrier comprising the above computer program.

In a further aspect, the present invention provides a method of making available for downloading the above computer program.

In another aspect, the present invention provides an apparatus arranged to compensate for offset in a received signal, the signal being modified by a sequence of symbols, each symbol extending over T_(s) signal samples, the apparatus comprising: a divider arranged to divide the received signal into frames of preferable length T_(s); a divider arranged to divide each frame into a plurality of N_(b) sub-frames; and a processor arranged to form N_(b) sequences of values, the values being derived from the corresponding sub-frame within each frame; and to take said N_(b) sequences as successive estimates of a frame sequence correctly aligned (with no offset) to the sequence of symbols.

For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:

FIG. 1 is a diagram illustrating a watermark embedding apparatus;

FIG. 2 shows a signal portion extraction filter H;

FIGS. 3 a and 3 b show respectively the typical amplitude and phase responses as a function of frequency of the filter H shown in FIG. 2;

FIG. 4 shows the payload embedding and watermark conditioning stage of the apparatus shown in FIG. 1;

FIG. 5 is a diagram illustrating the details of the watermark conditioning apparatus H_(c) of FIG. 4, including charts of the associated signals at each stage;

FIGS. 6 a and 6 b show two preferred alternative window shaping functions s(n) in the form of respectively a raised cosine function and a bi-phase function;

FIGS. 7 a and 7 b show respectively the frequency spectra for a watermark sequence conditioned with a raised cosine and a bi-phase shaping window function;

FIG. 8 is a diagram illustrating a watermark detector in accordance with an embodiment of the present invention;

FIG. 9 diagrammatically shows the whitening filter H_(w) of FIG. 8, for use in conjunction with a raised cosine shaping window function;

FIG. 10 diagrammatically shows the whitening filter H_(w) of FIG. 8, for use in conjunction with a bi-phase window shaping function;

FIG. 11 shows details of the watermark symbol extraction and buffering processes in accordance with an embodiment of the present invention;

FIG. 12 shows a typical shape of the correlation function output from the correlator of the watermark detector shown in FIG. 8; and

FIG. 13 shows an example of one preferred implementation of the symbol extraction and buffering stage.

FIG. 1 shows a block diagram of the apparatus required to perform the digital signal processing for embedding a multi-bit payload watermark w into a host signal x.

A host signal x is provided at an input 12 of the apparatus. The host signal x is passed in the direction of output 14 via the adder 22. However, a replica of the host signal x (input 8) is split off in the direction of the multiplier 18, for carrying the watermark information.

The watermark signal w_(c) is obtained from the payload embedder and watermark conditioning apparatus 6, and derived from a reference finite length random sequence w_(s) input to the payload embedder and watermark conditioning apparatus. The multiplier 18 is utilized to calculate the product of the watermark signal w_(c) and the replica audio signal x. The resulting product, w_(c)x, is then passed via a gain controller 24 to the adder 22. The gain controller 24 is used to amplify or attenuate the signal by a gain factor α.

The gain factor α controls the trade off between the audibility and the robustness of the watermark. It may be a constant, or variable in at least one of time, frequency and space. The apparatus in FIG. 1 shows that, when α is variable, it can be automatically adapted via a signal analyzing unit 26 based upon the properties of the host signal x. Preferably, the gain α is automatically adapted, so as to minimize the impact on the signal quality, according to a properly chosen perceptibility cost-function, such as a psycho-acoustic model of the human auditory system (HAS) in the case of an audio signal. Such a model is, for instance, described in the paper by E. Zwicker, “Audio Engineering and Psychoacoustics: Matching signals to the final receiver, the Human Auditory System”, Journal of the Audio Engineering Society, Vol. 39, pp. 115-126, March 1991.

In the following, an audio watermark is utilized, by way of example only, to describe this embodiment of the present invention.

The resulting watermarked audio signal y is then obtained at the output 14 of the embedding apparatus 10 by adding an appropriately scaled version of the product of w_(c) and x to the host signal:

$$y[n] = x[n] + \alpha\, w_c[n]\, x[n] \qquad (1)$$
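Equation (1) can be illustrated with a short sketch. The function name `embed` and the sample values are hypothetical; the signals are modelled as plain Python lists of equal length and α as a constant gain.

```python
def embed(x, w_c, alpha):
    """Return y[n] = x[n] + alpha * w_c[n] * x[n] for every sample n."""
    if len(x) != len(w_c):
        raise ValueError("x and w_c must have the same length")
    return [xn + alpha * wn * xn for xn, wn in zip(x, w_c)]


host = [1.0, -2.0, 4.0, 0.5]        # illustrative host samples x[n]
watermark = [1.0, 0.0, -1.0, 0.5]   # illustrative watermark samples w_c[n]
y = embed(host, watermark, alpha=0.1)
```

Because the watermark multiplies the host signal, a sample where w_c[n] is zero passes through unchanged, and the modification scales with the local signal amplitude.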

Preferably, the watermark w_(c) is chosen such that when multiplied with x, it predominantly modifies the short time envelope of x.

FIG. 2 shows one preferred embodiment in which the input 8 to the multiplier 18 in FIG. 1 is obtained by filtering a replica of the host signal x using a filter H in the filtering unit 15. If the filter output is denoted by x_(b), then according to this preferred embodiment, the watermarked signal is generated by adding the product of x_(b) and the watermark w_(c) to the host signal x:

$$y[n] = x[n] + \alpha\, w_c[n]\, x_b[n] \qquad (2)$$

Let x̄_(b) be defined such that x̄_(b)=x−x_(b), and y_(b) be defined such that y=y_(b)+x̄_(b); then, from equation (2), the envelope modulated portion y_(b) of the watermarked signal y is given as

$$y_b[n] = \left(1 + \alpha\, w_c[n]\right) x_b[n] \qquad (3)$$

Preferably, as shown in FIG. 3, the filter H is a linear phase band pass filter characterized by its lower cut-off frequency f_(L) and upper cut-off frequency f_(H). As can be seen in FIG. 3 b, the filter H has a linear phase response with respect to frequency f within the pass-band (BW). Thus, when H is a band pass filter, x_(b) and x̄_(b) are the in-band and out-of-band components of the host signal respectively. For optimum performance, it is preferable that the signals x_(b) and x̄_(b) are in phase. This is achieved by appropriately compensating for the phase distortion produced by filter H. In the case of a linear phase filter, the distortion is a simple time delay.

In FIG. 4, the details of the payload embedder and watermark conditioning unit 6 are shown. In this unit, the initial reference random sequence w_(s) is converted into a multi-bit watermark signal w_(c).

Firstly, a finite length, preferably zero mean and uniformly distributed random sequence w_(s), from now on also referred to as the watermark seed signal, is generated using a random number generator with an initial seed S. As will be appreciated later, it is preferable that this initial seed S is known to both the embedder and the detector, such that a copy of the watermark signal can be generated at the detector for comparison purposes. This results in a sequence of length L_(w):

$$w_s[k] \in [-1, 1], \quad \text{for } k = 0, 1, 2, \ldots, L_w - 1 \qquad (4)$$

It should be noted that in some applications, the seed can be transmitted to the detector via an alternate channel or can be derived from the received signal using some pre-determined protocol.

Then the sequence w_(s) is circularly shifted by the amounts d₁ and d₂ using the circularly shifting unit 30 to obtain the random sequences w_(d1) and w_(d2) respectively. It will be appreciated that these two sequences (w_(d1) and w_(d2)) are effectively a first sequence and a second sequence, with the second sequence being circularly shifted with respect to the first. Each sequence w_(di), i=1,2, is subsequently multiplied with a respective sign bit r_(i), in the multiplying unit 40, where r_(i)=+1 or −1. The respective values of r₁ and r₂ remain constant, and only change when the payload of the watermark is changed. Each sequence is then converted into a periodic, slowly varying narrow-band signal w_(i) of length L_(w)T_(s) by the watermark conditioning circuit 20 shown in FIG. 4. Finally, the slowly varying narrow-band signals w₁ and w₂ are added with a relative delay T_(r) (where T_(r)<T_(s)) to give the multi-bit payload watermark signal w_(c). This is achieved by first delaying the signal w₂ by the amount T_(r) using delaying unit 45 and subsequently by adding it to w₁ with the adding unit 50.
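The seed generation, circular shift and sign multiplication can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, Python's `random.Random` merely stands in for the unspecified random number generator, and the shift direction (towards higher indices) is an arbitrary choice.

```python
import random


def make_seed_sequence(seed, length):
    """Generate w_s[k] uniformly in [-1, 1] from an initial seed S."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


def circular_shift(seq, d):
    """Shift seq circularly by d positions towards higher indices (assumed direction)."""
    d %= len(seq)
    return seq[-d:] + seq[:-d] if d else list(seq)


def shifted_signed(seq, d, r):
    """Return r * (seq circularly shifted by d), with sign bit r = +1 or -1."""
    if r not in (1, -1):
        raise ValueError("sign bit must be +1 or -1")
    return [r * v for v in circular_shift(seq, d)]


w_s = make_seed_sequence(42, 8)        # seed 42 and length 8 are illustrative
w_d1 = shifted_signed(w_s, d=1, r=+1)
w_d2 = shifted_signed(w_s, d=3, r=-1)
```

Since the generator is seeded, a detector that knows the same seed can regenerate the same `w_s` for comparison.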

FIG. 5 shows the watermark conditioning apparatus 20 used in the payload embedder and watermark conditioning apparatus 6 in more detail. The watermark seed signal w_(s) is input to the conditioning apparatus 20.

For convenience, the modification of only one of the sequences w_(di) is shown in FIG. 5, but it will be appreciated that each of the sequences is modified in a similar manner, with the results being added to obtain the watermark signal w_(c).

As shown in FIG. 5, each watermark signal sequence w_(di)[k], i=1,2 is applied to the input of a sample repeater 180. Chart 181 illustrates one of the sequences w_(di) as a sequence of values of random numbers between +1 and −1, with the sequence being of length L_(w). The sample repeater repeats each value within the watermark seed signal sequence T_(s) times, so as to generate a rectangular pulse train signal. T_(s) is referred to as the watermark symbol period and represents the span of the watermark symbol in the audio signal. Chart 183 shows the results of the signal illustrated in chart 181 once it has passed through the sample repeater 180.

A window shaping function s[n], such as a raised cosine window, is then applied to convert the rectangular pulse functions derived from w_(d1) and w_(d2) into slowly varying watermark sequence functions w₁[n] and w₂[n] respectively.

Chart 184 shows a typical raised cosine window shaping function, which is also of span T_(s).
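The sample repeater and window shaping steps can be sketched together. The window formula used here, s[n] = 0.5(1 − cos(2πn/T_(s))), is one common raised cosine form and is an assumption, as the description shows the window only graphically; the function names are likewise hypothetical.

```python
import math


def raised_cosine(t_s):
    """Assumed raised cosine window of span t_s samples."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / t_s)) for n in range(t_s)]


def condition(symbols, t_s, window):
    """Repeat each symbol value t_s times and shape each pulse with the window."""
    if len(window) != t_s:
        raise ValueError("window must span exactly t_s samples")
    out = []
    for value in symbols:
        # One rectangular pulse of height `value`, multiplied by s[n].
        out.extend(value * s for s in window)
    return out


win = raised_cosine(4)
shaped = condition([1.0, -1.0], t_s=4, window=win)
```

The output has L_(w)·T_(s) samples, and each symbol contributes one smooth pulse that starts near zero and peaks at mid-symbol.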

The generated watermark sequences w₁[n] and w₂[n] are then added with a relative delay T_(r) (where T_(r)<T_(s)) to give the multi-bit payload watermark signal w_(c)[n], i.e.

$$w_c[n] = w_1[n] + w_2[n - T_r] \qquad (5)$$

The value of T_(r) is chosen such that the zero crossings of w₁ match the maximum amplitude points of w₂ and vice-versa. Thus, for a raised cosine window shaping function T_(r)=T_(s)/2, and for a bi-phase window shaping function T_(r)=T_(s)/4. For other window shaping functions, other values of T_(r) are possible.
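A minimal sketch of the delayed addition in equation (5) follows. Because the conditioned signals are described as periodic, the delay is applied circularly here; that treatment, and the name `combine`, are assumptions for illustration.

```python
def combine(w1, w2, t_r):
    """Return w_c[n] = w1[n] + w2[n - t_r], with the delay taken circularly."""
    if len(w1) != len(w2):
        raise ValueError("w1 and w2 must have the same length")
    total = len(w1)
    return [w1[n] + w2[(n - t_r) % total] for n in range(total)]


w_c = combine([1, 2, 3, 4], [10, 20, 30, 40], t_r=1)
```

For a raised cosine window one would pass `t_r = t_s // 2`, and for a bi-phase window `t_r = t_s // 4`, following the values given above.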

As will be appreciated from the description below, during detection the correlation of w_(c)[n] will generate two correlation peaks that are separated by pL′ (as can be seen in FIG. 12). pL′ is an estimate of the circular shift pL between w_(d1) and w_(d2), which is part of the payload, and is defined as

$$pL = |d_2 - d_1| \bmod \left( \left[ L_w / 2 \right] \right) \qquad (6)$$

In addition to pL, extra information can be encoded by changing therelative signs of the embedded watermarks.

In the detector, this is seen as a relative sign r_(sign) between the correlation peaks. It may be defined as:

$$r_{sign} = \frac{2 \cdot \rho_1 + \rho_2 + 3}{2} \in \{0, 1, 2, 3\} \qquad (7)$$

where ρ₁=sign(cL₁) and ρ₂=sign(cL₂) are respectively estimates of the sign bits r₁ (input 80) and r₂ (input 90) of FIG. 4, and cL₁ and cL₂ are the values of the correlation peaks corresponding to w_(d1) and w_(d2) respectively. The overall watermark payload pL_(w), for an error-free detection, is then given as a combination of r_(sign) and pL:

$$pL_w = \langle r_{sign},\, pL \rangle \qquad (8)$$
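Equations (7) and (8) can be checked with a small sketch. The function names are hypothetical, and treating a correlation value of exactly zero as positive is an assumption, since the description does not cover that case.

```python
def sign(value):
    """Return +1 for non-negative values and -1 otherwise (zero handling assumed)."""
    return 1 if value >= 0 else -1


def relative_sign(c_l1, c_l2):
    """Map the signs of the two correlation peaks to r_sign in {0, 1, 2, 3}."""
    rho1, rho2 = sign(c_l1), sign(c_l2)
    # 2*rho1 + rho2 + 3 is always even, so integer division is exact.
    return (2 * rho1 + rho2 + 3) // 2


def payload(c_l1, c_l2, peak_distance):
    """Combine r_sign with the measured peak separation pL' as in equation (8)."""
    return (relative_sign(c_l1, c_l2), peak_distance)


decoded = payload(0.8, -0.6, peak_distance=17)  # illustrative peak values
```

The four sign combinations map one-to-one onto the four values of r_(sign), which is how two extra bits are carried on top of pL.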

The maximum information (I_(max)), in number of bits, that can be carried by a watermark sequence of length L_(w) is thus given by:

$$I_{max} = \log_2 \left( 4 \cdot \left[ L_w / 2 \right] \right) \text{ bits} \qquad (9)$$
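As a worked instance of equation (9), the sketch below evaluates I_(max) for an illustrative sequence length. The bracket [L_(w)/2] is read here as a ceiling, which is an assumption (for even L_(w) floor and ceiling coincide); the value L_(w)=1024 is chosen only for the example.

```python
import math


def max_payload_bits(l_w):
    """Evaluate log2(4 * [l_w / 2]), reading the bracket as a ceiling (assumed)."""
    half = (l_w + 1) // 2
    return math.log2(4 * half)


bits = max_payload_bits(1024)  # 4 * 512 = 2048 distinct payloads
```

The factor 4 accounts for the four values of r_(sign), and the bracketed term for the number of distinguishable circular shifts pL.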

In such a scheme, the payload is immune to relative offset between the embedder and the detector, and also to possible time scale modifications.

The window shaping function has been identified as one of the main parameters that controls the robustness and audibility behavior of the present watermarking scheme. As illustrated in FIGS. 6 a and 6 b, two examples of possible window shaping functions are herein described: a raised cosine function and a bi-phase function.

It is preferable to use a bi-phase window function instead of a raised cosine window function, so as to obtain a quasi DC-free watermark signal. This is illustrated in FIGS. 7 a and 7 b, showing the frequency spectra corresponding to a watermark sequence (in this case a sequence of w_(di)[k]={1, 1, −1, 1, −1, −1}) conditioned with respectively a raised cosine and a bi-phase window shaping function. As can be seen, the frequency spectrum for the raised cosine conditioned watermark sequence has a maximum at frequency f=0, whilst the frequency spectrum for the bi-phase shaped watermark sequence has a minimum at f=0, i.e. it has very little DC component.

Useful information is only contained in the non-DC component of the watermark. Consequently, for the same added watermark energy, a watermark conditioned with the bi-phase window will carry more useful information than one conditioned by the raised cosine window. As a result, the bi-phase window offers superior audibility performance for the same robustness or, conversely, it allows a better robustness for the same audibility quality.

Such a bi-phase function could be utilized as a window shaping function for other watermarking schemes. In other words, a bi-phase function could be applied to reduce the DC component of signals (such as a watermark) that are to be incorporated into another signal.

FIG. 8 shows a block diagram of a watermark detector (200, 300, 400). The detector consists of three major stages: (a) the watermark symbol extraction stage (200), (b) the buffering and interpolation stage (300), and (c) the correlation and decision stage (400).

In the symbol extraction stage (200), the received watermarked signal y′[n] is processed to generate multiple (N_(b)) estimates of the watermark sequence. These estimates of the watermark sequence are required to resolve time offset that may exist between the embedder and the detector, so that the watermark detector can synchronize to the watermark sequence inserted in the host signal.

In the buffering and interpolation stage (300), these estimates are de-multiplexed into N_(b) separate buffers, and an interpolation is applied to each buffer to resolve time scale modifications that may have occurred, e.g. a drift in sampling (clock) frequency may have resulted in a stretch or shrink in the time domain signal (i.e. the watermark may have been stretched or shrunk).

In the correlation and decision stage (400), the content of each buffer is correlated with the reference watermark and the maximum correlation peaks are compared against a threshold to determine the likelihood of whether the watermark is indeed embedded within the received signal y′[n].

In order to maximize the accuracy of the watermark detection, the watermark detection process is typically carried out over a length of received signal y′[n] that is 3 to 4 times that of the watermark sequence length. Thus each watermark symbol to be detected can be constructed by taking the average of several estimates of said symbol. This averaging process is referred to as smoothing, and the number of times the averaging is done is referred to as the smoothing factor s_(f). Let L_(D) be the detection window length, defined as the length of the audio segment (in number of samples) over which a watermark detection truth-value is reported. Then, L_(D)=s_(f)L_(w)T_(s), where T_(s) is the symbol period and L_(w) the number of symbols within the watermark sequence. During symbol extraction, a factor T_(s) decimation takes place in the energy computation stage. Thus, the length (L_(b)) of each buffer 320 within the buffering and interpolation stage is L_(b)=s_(f)L_(w).

In the watermark symbol extraction stage 200 shown in FIG. 8, the incoming watermarked signal y′[n] is input to the optional signal conditioning filter H_(b) (210). This filter 210 is typically a band pass filter and has the same behavior as the corresponding filter (H, 15) shown in FIG. 2. The output of the filter H_(b) is y′_(b)[n] and, assuming linearity within the transmission medium, it follows from equations (1) and (3):

$$y'_b[n] \cong y_b[n] = \left(1 + \alpha\, w_c[n]\right) x_b[n] \qquad (10)$$

Note that in the above expression, the possible time offset between the embedder and the detector is implicitly ignored. For ease of explanation of the general watermarking scheme principles, from now on, it is assumed that there is perfect synchronism between the embedder and the detector (i.e. no offset). An explanation is given below, however, with reference to FIG. 11, of how to compensate for time offset in accordance with the present invention.

Note that when no filter is used in the embedder (i.e., when H=1) then H_(b) in the detector can also be omitted, or it can still be included to improve the detection performance. If H_(b) is omitted, then y_(b) in equation (10) is replaced with y. The rest of the processing is the same.

We assume that the audio signal is divided into frames of length T_(s), and that y′_(b,m)[n] is the n-th sample of the m-th filtered frame signal. The energy E[m] corresponding to the m-th frame is:

$$E[m] = \sum_{n=0}^{T_s - 1} \left| y'_{b,m}[n] \right|^2 \qquad (11)$$
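The frame energy of equation (11) can be sketched as below, for non-overlapping frames and a real-valued signal held in a list. The function name is hypothetical, and dropping a trailing partial frame is an assumption.

```python
def frame_energies(signal, t_s):
    """Return E[m], the sum of squared samples over each frame of length t_s."""
    n_frames = len(signal) // t_s  # a trailing partial frame is ignored
    return [
        sum(v * v for v in signal[m * t_s:(m + 1) * t_s])
        for m in range(n_frames)
    ]


energies = frame_energies([1, 2, 3, 4, 5], t_s=2)
```

Each frame of T_(s) samples yields a single energy value, which is the factor T_(s) decimation referred to above.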

Combining this with equation 10, it follows that:

$$E[m] \approx \sum_{n=0}^{T_s - 1} \left| y_{b,m}[n] \right|^2 = \sum_{n=0}^{T_s - 1} \left| \left(1 + \alpha\, w_e[m]\right) x_{b,m}[n] \right|^2 \qquad (12)$$

where w_(e)[m] is the m-th extracted watermark symbol and contains N_(b) time-multiplexed estimates of the embedded watermark sequences. Solving for w_(e)[m] in equation 12 and ignoring higher order terms of α gives the following approximation:

$$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{\sum_{n=0}^{T_s - 1} \left| y_{b,m}[n] \right|^2}{\sum_{n=0}^{T_s - 1} \left| x_{b,m}[n] \right|^2} - 1 \right) \qquad (13)$$
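The intermediate step between equations 12 and 13 can be written out as follows; this is a sketch of the algebra, with the factor (1+αw_(e)[m]) taken as constant over the frame, as equation 12 already assumes.

```latex
\begin{aligned}
\sum_{n=0}^{T_s-1} \left| y_{b,m}[n] \right|^2
  &= \left(1 + \alpha\, w_e[m]\right)^2 \sum_{n=0}^{T_s-1} \left| x_{b,m}[n] \right|^2 \\
  &= \left(1 + 2\alpha\, w_e[m] + \alpha^2 w_e[m]^2\right) \sum_{n=0}^{T_s-1} \left| x_{b,m}[n] \right|^2 \\
  &\approx \left(1 + 2\alpha\, w_e[m]\right) \sum_{n=0}^{T_s-1} \left| x_{b,m}[n] \right|^2
  \qquad \text{(dropping the } \alpha^2 \text{ term)}
\end{aligned}
```

Dividing both sides by the host energy sum, subtracting 1 and dividing by 2α then yields the expression for w_(e)[m].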

In the watermark extraction stage 200 shown in FIG. 8, the output y′_(b)[n] of the filter H_(b) is provided as an input to a frame divider 220, which divides the audio signal into frames of length T_(s), i.e. into y′_(b,m)[n], with the energy calculating unit 230 then being used to calculate the energy corresponding to each of the framed signals as per equation (12). The output of this energy calculation unit 230 is then provided as an input to the whitening stage H_(w) (240), which performs the function shown in equation 13 so as to provide an output w_(e)[m]. Alternative implementations (240A, 240B) of this whitening stage are illustrated in FIGS. 9 and 10.

It will be realized that the denominator of equation 13 contains a term that requires knowledge of the host (original) signal x. As the signal x is not available to the detector, the denominator of equation 13 must be estimated in order to calculate w_(e)[m].

Below is described how such an estimation can be achieved for the two described window shaping functions (the raised cosine window shaping function and the bi-phase window shaping function), but it will equally be appreciated that the teaching could be extended to other window shaping functions.

In relation to the raised cosine window shaping function shown in FIG. 6 a, it has been realized that the audio envelope induced by the watermark contributes only to the noisy part of the energy function E[m]. The slowly varying part (i.e. the low frequency components) is predominantly due to the contribution of the envelope of the original audio signal x. Thus, equation 13 may be approximated by:

$$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E[m]}{\mathrm{lowpass}\left(E[m]\right)} - 1 \right) \qquad (14)$$

where “lowpass(.)” is a low pass filter function. Thus, it will be appreciated that the whitening filter H_(w) for the raised cosine window shaping function can be realized as shown in FIG. 9.

As can be seen, such a whitening filter H_(w) (240A) comprises an input 242A for receiving the signal E[m]. A portion of this signal is then passed through the low pass filter 247A to produce a low pass filtered energy signal E_(LP)[m], which in turn is provided as an input to the calculation stage 248A along with the function E[m]. The calculation stage 248A then divides E[m] by E_(LP)[m] to calculate the extracted watermark symbol w_(e)[m].
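A minimal sketch of this whitening step, following equation 14, is given below. The description does not specify the low pass filter, so a centred moving average stands in for lowpass(.) here; that choice, the default width and the function names are all assumptions. Frame energies are taken to be strictly positive so that the division is defined.

```python
def moving_average(values, width):
    """Centred moving average, shortened at the edges (stand-in low pass filter)."""
    half = width // 2
    out = []
    for m in range(len(values)):
        lo = max(0, m - half)
        hi = min(len(values), m + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def whiten_raised_cosine(energy, alpha, width=5):
    """Approximate w_e[m] = (E[m] / lowpass(E[m]) - 1) / (2 * alpha)."""
    e_lp = moving_average(energy, width)
    return [(e / lp - 1.0) / (2.0 * alpha) for e, lp in zip(energy, e_lp)]


flat = whiten_raised_cosine([4.0] * 6, alpha=0.1)             # no modulation present
bumpy = whiten_raised_cosine([4.0, 4.4, 4.0, 3.6, 4.0], alpha=0.1)
```

A constant energy sequence whitens to zeros, while a frame whose energy lies above its local average gives a positive symbol estimate.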

When a bi-phase window function is employed in the watermark conditioning stage of the embedder, a different approach should be utilized to estimate the envelope of the original audio, and hence to calculate w_(e)[m].

It will be seen by examination of the bi-phase window function shown in FIG. 6 b that, when the audio envelope is modulated with such a window function, the first and the second halves of the frame are scaled in opposite directions. In the detector, this property is utilized to estimate the envelope energy of the host signal x.

Consequently, within the detector, each audio frame is first sub-divided into two halves. The energy functions corresponding to the first and second half-frames are hence given by

$$E_1[m] = \sum_{n=0}^{T_s/2 - 1} \left| y'_{b,m}[n] \right|^2 \qquad (15)$$

and

$$E_2[m] = \sum_{n=T_s/2}^{T_s - 1} \left| y'_{b,m}[n] \right|^2 \qquad (16)$$

respectively. As the envelope of the original audio is modulated in opposite directions within the two sub-frames, the original audio envelope can be approximated as the mean of E₁[m] and E₂[m].
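The half-frame energies of equations (15) and (16), and the mean used as the envelope estimate, can be sketched as follows. The function name is hypothetical, and an even frame length is assumed so that the two halves are equal.

```python
def half_frame_energies(frame):
    """Return (E1, E2): squared-sample sums over the first and second half-frame."""
    half = len(frame) // 2  # assumes an even frame length T_s
    e1 = sum(v * v for v in frame[:half])
    e2 = sum(v * v for v in frame[half:])
    return e1, e2


e1, e2 = half_frame_energies([1, 2, 3, 4])
envelope_estimate = (e1 + e2) / 2  # mean of E1[m] and E2[m]
```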

Further, the instantaneous modulation value can be taken as the difference between these two functions. Thus, for the bi-phase window function, the watermark w_(e)[m] can be approximated by:

$$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E_1[m] - E_2[m]}{E_1[m] + E_2[m]} - 1 \right) \qquad (17)$$

Consequently, the whitening filter H_(w) (240B) in FIG. 8 for a bi-phase window shaping function can be realized as shown in FIG. 10. Inputs 242B and 243B respectively receive the energy functions of the first and second half frames E₁[m] and E₂[m]. Each energy function is then split up into two, and provided to adders 245B and 246B which respectively calculate E₁[m]−E₂[m], and E₁[m]+E₂[m]. Both of these calculated functions are then passed to the calculating unit 24B, which divides the value from adder 245B by the value from 246B so as to calculate w_(e)[m], containing N_(b) time-multiplexed estimates of the embedded watermark sequences, in accordance with equation 17.

This output w_(e)[m] is then passed to the buffering and interpolation stage 300, where the signal is de-multiplexed by a de-multiplexer 310, buffered in buffers 320 of length L_(b), so as to resolve a lack of synchronism between the embedder and the detector, and interpolated within the interpolation unit 330 so as to compensate for a time scale modification between the embedder and the detector.

In order to maximize the possible robustness of a watermark, it is important to make sure that the watermarking system is immune to time offsets between the embedder and the detector. In other words, the watermark detector must be able to synchronize to the watermark sequence inserted in the host signal.

FIG. 11 illustrates the process carried out by the buffering and interpolation stage 300 to resolve the offset issue. The example described illustrates the process for resolving offset when a raised cosine window shaping function has been employed in the watermark embedding process. However, in principle the same technique is applicable when the bi-phase window shaping function has been used.

Referring to FIG. 11, after filtering by the filter H_(b) 210, the incoming audio signal stream y′_(b)[n] is separated into preferably overlapping frames 302 of effective length T_(s) by the frame divider 220.

Preferably, to resolve possible offset between the embedder and the detector, each frame is divided into N_(b) sub-frames (304 a, 304 b, . . . , 304 x), and the above computations (equations (12) to (17)) are applied on a sub-frame basis.

Preferably, each sub-frame overlaps with an adjacent sub-frame. In the example shown, it can be seen that there is a 50% overlap (T_(s)/N_(b)) of each sub-frame (304 a, 304 b, . . . , 304 x), with each of the sub-frames being of length 2T_(s)/N_(b). When overlapping sub-frames are considered, the main frames are preferably longer than the symbol period T_(s) so as to allow inter-frame overlap as shown in FIG. 11.
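One way to lay out such overlapping sub-frames is sketched below: sub-frames of length 2T_(s)/N_(b) start every T_(s)/N_(b) samples, so neighbours overlap by 50% and the last sub-frame of a frame reaches into the next symbol period. The exact layout of FIG. 11 is not reproduced here; the indexing, the function name and the requirement that N_(b) divide T_(s) are assumptions.

```python
def sub_frame_bounds(m, i, t_s, n_b):
    """Return (start, stop) sample indices of sub-frame i within frame m."""
    if t_s % n_b:
        raise ValueError("this sketch assumes n_b divides t_s")
    hop = t_s // n_b                  # spacing between sub-frame starts
    start = m * t_s + i * hop
    return start, start + 2 * hop     # each sub-frame spans 2 * T_s / N_b samples


first = sub_frame_bounds(0, 0, t_s=8, n_b=4)
last = sub_frame_bounds(0, 3, t_s=8, n_b=4)
next_frame = sub_frame_bounds(1, 0, t_s=8, n_b=4)
```

In this layout the last sub-frame of frame 0 covers samples 6 to 9 and therefore overlaps the first sub-frame of frame 1, which matches the remark that the main frames need to be longer than T_(s).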

The energy of the audio is then computed for each sub-frame by the whitening stage 240, and the resulting values are de-multiplexed into the N_(b) buffers 320 by the de-multiplexer 310. Each successive one (B₁, B₂, . . . , B_(Nb)) of the buffers 320 will thus contain a sequence of values, with the first buffer B₁ containing a sequence of values corresponding to the first sub-frame within each frame, the second buffer B₂ containing a sequence of values corresponding to the second sub-frame within each frame, etc.

If w_(Di) is the content of the i-th buffer, then it can be shown that:

$$w_{D_i}[k] = w_e[k \cdot N_b + i], \quad k \in \{0, \ldots, L_b - 1\} \qquad (18)$$

where L_(b) is the buffer length.
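The de-multiplexing of equation (18) amounts to taking every N_(b)-th value of w_(e), starting at position i. A minimal sketch follows; the function name is hypothetical and the buffer index i is zero-based here, whereas the buffers in the text are labelled B₁ to B_(Nb).

```python
def demultiplex(w_e, n_b):
    """Split the time-multiplexed sequence w_e into n_b buffers.

    Buffer i holds w_e[k * n_b + i] for k = 0, 1, 2, ...
    """
    return [w_e[i::n_b] for i in range(n_b)]


buffers = demultiplex([0, 1, 2, 3, 4, 5], n_b=3)
```

Each buffer thus collects the values from the same sub-frame position in every frame, giving one candidate alignment per buffer.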

For a raised cosine window shaping function, the energy of the embedded watermark is concentrated near the center of the frame, such that the sub-frame best aligned with the center of the frame will result in a distinctly better estimate of the embedded watermark symbol than all the other sub-frames. Effectively, each buffer thus contains an estimate of the symbol sequence, the estimates corresponding to the sequences having different time offsets.

The sub-frame best aligned with the center of the frame (i.e. the best estimate of the correctly aligned frame) is determined by correlating the contents of each buffer with the reference watermark sequence. The sequence with the maximum correlation peak value is chosen as the best estimate of the correctly aligned frame. The corresponding confidence level, as described below, is used to determine the truth-value of the detection. Preferably, the correlation process is halted once an estimated watermark sequence with a correlation peak above the defined threshold has been found.
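The selection of the best aligned buffer may be sketched as follows. The sketch assumes a direct circular correlation over all delays and assumes that each estimate has the same length as the reference; the names `circular_correlation` and `best_buffer` are hypothetical, and a practical detector would more likely compute the correlation by a faster method.

```python
def circular_correlation(estimate, reference):
    """Circular cross-correlation of two equal-length sequences (all delays)."""
    n = len(reference)
    if len(estimate) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    return [sum(estimate[(k + d) % n] * reference[k] for k in range(n))
            for d in range(n)]


def best_buffer(buffers, reference):
    """Return (index, peak) of the buffer with the largest absolute peak."""
    best_index, best_peak = -1, float("-inf")
    for i, buf in enumerate(buffers):
        peak = max(abs(c) for c in circular_correlation(buf, reference))
        if peak > best_peak:
            best_index, best_peak = i, peak
    return best_index, best_peak


if __name__ == "__main__":
    ref = [1, -1, 1, 1, -1, -1, 1, -1]
    buffers = [[0.1] * 8, ref[2:] + ref[:2], [0.5 * x for x in ref]]
    print(best_buffer(buffers, ref))
```

The absolute value is used so that a strongly negative peak is treated in the same way as a positive one.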

Typically, the length of each buffer is between 3 and 4 times the watermark sequence length L_(w), and is thus typically between 2048 and 8192 symbols, and N_(b) is typically within the range of 2 to 8.

The buffer length is normally 3 to 4 times that of the watermark sequence so that each watermark symbol can be constructed by taking the average of several estimates of said symbol. This averaging process is referred to as smoothing, and the number of times the averaging is done is referred to as the smoothing factor s_(f). Thus, given the buffer length L_(b) and the watermark sequence length L_(w), the smoothing factor s_(f) is such that:

L_(b) = s_(f)·L_(w)   (19)
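The smoothing of equation (19) may be sketched as below. The sketch assumes that the watermark sequence repeats with period L_(w) within the buffer, so that the entries at positions j, j + L_(w), j + 2L_(w), . . . are all estimates of the same symbol; the function name `smooth` is hypothetical.

```python
def smooth(buffer, l_w):
    """Average the s_f = len(buffer) / l_w estimates of each watermark symbol.

    Hypothetical helper: assumes the buffer length is an integer multiple
    of the watermark sequence length l_w, as in equation (19).
    """
    if l_w <= 0 or len(buffer) % l_w != 0:
        raise ValueError("buffer length must be an integer multiple of l_w")
    s_f = len(buffer) // l_w
    return [sum(buffer[j + r * l_w] for r in range(s_f)) / s_f
            for j in range(l_w)]


if __name__ == "__main__":
    # L_b = 9, L_w = 3, hence a smoothing factor s_f = 3.
    print(smooth([1, 2, 3, 3, 4, 5, 5, 6, 7], 3))
```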

In another preferred embodiment, the detector refines the parameters used in the offset search based upon the results of a previous search step. For instance, if a first series of estimates shows that the results stored in buffer B₃ provide the best estimate of the information signal, then the next offset search (either on the same received signal, or on the signal received during the next detection window) is refined by shifting the position of the sub-frames towards the position of the best estimate sub-frame. The estimates of the sequence having zero offset can thus be iteratively improved.

As shown in FIG. 8, outputs (w_(D1), w_(D2), . . . w_(DNb)) from the buffering stage are passed to the interpolation stage and, after interpolation, the outputs (w_(I1), w_(I2), . . . w_(INb)) of this stage, which are needed to resolve a possible time scale modification in the watermarked signal, are passed to the correlation and decision stage. All of the estimates (w_(I1), w_(I2), . . . w_(INb)) of the watermark corresponding to the different possible offset values are passed to the correlation and decision stage 400.

The correlator 410 calculates the correlation of each estimate w_(Ij), j=1, . . . , N_(b) with respect to the reference watermark sequence w_(c)[k]. Each respective correlation output corresponding to each estimate is then applied to the maximum detection unit 420, which determines which two estimates provided the maximum correlation peak values. These estimates are chosen as the ones that best fit the circularly shifted versions w_(d1) and w_(d2) of the reference watermark. The correlation values for these estimated sequences are passed to the threshold detector and payload extractor unit 430.

The reference watermark sequence w_(s) used within the detector corresponds to (a possibly circularly shifted version of) the original watermark sequence applied to the host signal. For instance, if the watermark signal was calculated using a random number generator with seed S within the embedder, then equally the detector can calculate the same random number sequence using the same random number generation algorithm and the same initial seed S so as to determine the watermark signal. Alternatively, the watermark signal originally applied in the embedder and utilized by the detector as a reference could simply be any predetermined sequence.

FIG. 12 shows a typical shape of a correlation function as output from the correlator 410. The horizontal scale shows the correlation delay (in terms of the sequence samples). The vertical scale on the left hand side (referred to as the confidence level cL) represents the value of the correlation peak, normalized with respect to the standard deviation of the normally distributed correlation function.
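The normalization that yields the confidence level cL may be sketched as follows. As a simplifying assumption, the standard deviation is estimated here over the whole correlation function, peaks included; the description does not state how the standard deviation is obtained, and the function name `confidence_levels` is hypothetical.

```python
import statistics


def confidence_levels(correlation):
    """Normalize correlation values by their standard deviation.

    Assumption of this sketch: the standard deviation is estimated from all
    correlation values, including any peaks.
    """
    sigma = statistics.pstdev(correlation)
    if sigma == 0:
        raise ValueError("correlation function has zero spread")
    return [c / sigma for c in correlation]


if __name__ == "__main__":
    print(confidence_levels([2, -2, 2, -2]))
```

A value of cL can then be compared directly with a detection threshold expressed in units of standard deviation.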

As can be seen, the typical correlation is relatively flat with respect to cL, and centered about cL=0. However, the function contains two peaks, which are separated by pL (see equation 6) and extend upwards to cL values that are above the detection threshold when a watermark is present. When the correlation peaks are negative, the above statement applies to their absolute values.

A horizontal line (shown in the Fig. as being set at cL=8.7) represents the detection threshold. The detection threshold value controls the false alarm rate.

Two kinds of false alarms exist: the false positive rate, defined as the probability of detecting a watermark in non-watermarked items, and the false negative rate, which is defined as the probability of not detecting a watermark in watermarked items. Generally, the requirement on the false positive rate is more stringent than that on the false negative rate. The scale on the right hand side of FIG. 12 illustrates the probability of a false positive alarm b. As can be seen in the example shown, the probability of a false positive b=10⁻¹² is equivalent to the threshold cL=8.7, whilst b=10⁻⁸³ is equivalent to cL=20.

After each detection interval, the detector determines whether the original watermark is present or whether it is not present, and on this basis outputs a “yes” or a “no” decision. If desired, to improve this decision making process, a number of detection windows may be considered. In such an instance, the false positive probability is a combination of the individual probabilities for each detection window considered, dependent upon the desired criteria. For instance, it could be determined that if the correlation function has two peaks above a threshold of cL=7 on any two out of three detection intervals, then the watermark is deemed to be present. Such detection criteria can be altered depending upon the desired use of the watermark signal and to take into account factors such as the original quality of the host signal and how badly the signal is likely to be corrupted during normal transmission.
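The example criterion of two peaks above cL=7 on any two out of three detection intervals may be sketched as below. The function names and default parameters are hypothetical. As a simplification, every confidence value whose absolute value exceeds the threshold is counted as a separate peak, which presumes that each peak is a single sample wide.

```python
def window_positive(confidence, threshold=7.0, peaks_required=2):
    """True if enough values exceed the threshold in absolute value."""
    peaks = sum(1 for c in confidence if abs(c) > threshold)
    return peaks >= peaks_required


def watermark_present(windows, threshold=7.0, hits_required=2):
    """Combine several detection windows into a single yes/no decision."""
    hits = sum(1 for w in windows if window_positive(w, threshold))
    return hits >= hits_required


if __name__ == "__main__":
    windows = [
        [0.5, 8.1, -0.2, 9.0],   # two peaks: positive window
        [0.1, 0.3, 7.5, 0.2],    # one peak: negative window
        [-7.2, 0.4, 8.8, 0.0],   # two peaks (one negative): positive window
    ]
    print(watermark_present(windows))
```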

The payload extractor unit 430 may subsequently be utilized to extract the payload (e.g. information content) from the detected watermark signal. Once the unit has estimated the two correlation peaks cL₁ and cL₂ that exceed the detection threshold, an estimate pL′ of the circular shift pL (defined in equation (6)) is derived as the distance between the peaks. Next, the signs ρ₁ and ρ₂ of the correlation peaks are determined, and hence r_(sign) is calculated from equation (7). The overall watermark payload may then be calculated using equation (8).

For instance, it can be seen in FIG. 12 that pL is the relative distance between the two peaks. Both peaks are positive, i.e. ρ₁=+1 and ρ₂=+1. From equation (7), r_(sign)=3. Consequently, the payload pL_(w)=<3, pL>.

The symbol extraction and buffering stages described in FIG. 8 can be efficiently implemented by the apparatus 500 shown in FIG. 13. Here, the offset compensation is achieved without any extra computation. It will also be seen that the de-multiplexing is achieved with a simple set of delays and decimation blocks.

First, the incoming frame signal y_(b,m) is subdivided into N_(b) non-overlapping sub-frames of length T_(s)/N_(b) and the energy of each sub-frame is computed using the energy computation unit 230. Secondly, the whitening filter H_(w) is applied in the whitening unit 240. The combination of the delay unit 510 and the adding unit 520 effectively realizes a 50% overlap between adjacent sub-frames. After the watermark symbol sequence w_(e)[m] is generated at the output of the adder unit 520, it is subsequently distributed over the N_(b) buffers 320 using the combination of the delay set 512 and the down sampling set 530. This is done such that each buffer gets one value for every N_(b) incoming samples of w_(e)[m]. For instance, if the first sample goes to w_(D1), the second sample goes to w_(D2), the third one to w_(D3), the N_(b)-th one to w_(DNb), and then the (N_(b)+1)-th one goes back to w_(D1), and so on until all the buffers are filled up. Thus the i-th buffer entry w_(Di)[k] may be expressed as

w_(Di)[k] = w_(e)[N_(b)·k + i]   (20)

It is seen that the sampling frequency of w_(Di)[k] is 1/N_(b) times that of w_(e)[m]. This decimation is achieved via the decimating set 532 in FIG. 13.

Since non-overlapping frames are considered in the energy computing unit 230, the total computation needed to generate the N_(b) sequences is the same as that which would have been required if only one sequence with symbols extending over the whole frame was computed.
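The effect of the delay unit and the adding unit may be illustrated by the following sketch, which omits the whitening filter H_(w) for brevity and takes the energy of a sub-frame to be the sum of its squared samples (both are assumptions of the sketch). The function names are hypothetical. Adding each non-overlapping sub-frame energy to the previous one gives the energy of a sub-frame of twice the length with 50% overlap, without revisiting the signal samples.

```python
def subframe_energies(signal, hop):
    """Energy (sum of squares) of consecutive non-overlapping sub-frames."""
    usable = len(signal) - len(signal) % hop
    return [sum(x * x for x in signal[s:s + hop])
            for s in range(0, usable, hop)]


def delay_and_add(values):
    """y[m] = x[m] + x[m-1], with x[-1] taken as zero."""
    return [v + (values[m - 1] if m > 0 else 0)
            for m, v in enumerate(values)]


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6]
    energies = subframe_energies(signal, hop=2)
    print(energies)                 # non-overlapping sub-frames of length 2
    print(delay_and_add(energies))  # overlapping sub-frames of length 4
```

Apart from the first output, which has no predecessor, each value equals the energy computed directly over the corresponding overlapping sub-frame.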

It will be appreciated by the skilled person that various implementations not specifically described would be understood as falling within the scope of the present invention. For instance, whilst only the functionality of the detecting apparatus has been described, it will be appreciated that the apparatus could be realized as a digital circuit, an analog circuit, a computer program, or a combination thereof.

Equally, whilst the above embodiment has been described with reference to an audio signal, it will be appreciated that the present invention can be applied to add information to other types of signal, for instance information or multimedia signals, such as video and data signals.

Further, it will be appreciated that the invention can be applied to watermarking schemes containing only one watermarking sequence (i.e. a 1-bit scheme), or to watermarking schemes containing multiple watermarking sequences. Such multiple sequences can be simultaneously or successively embedded within the host signal.

Equally, whilst the above detection of the watermark has been described with each estimate being correlated, it will be appreciated that the correlation procedure can be arranged to stop once a positive detection of the watermark has been made. This reduces the offset determination time. Further, the decoder can be arranged to adaptively compensate for time offset, by re-ordering the buffers (or the order in which the buffers are correlated) such that the best aligned buffer in the current detection window will be the first buffer to be correlated in the next detection window.
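The early stopping and the adaptive re-ordering may be sketched as follows. The names `search_order` and `first_detection` are hypothetical, and `peak_of` stands for whatever routine computes the confidence level of the correlation peak for a given buffer; in the example it is simply a table lookup.

```python
def search_order(n_b, previous_best=None):
    """Order in which to correlate the buffers; the previous best goes first."""
    order = list(range(n_b))
    if previous_best is not None:
        order.remove(previous_best)
        order.insert(0, previous_best)
    return order


def first_detection(peak_of, threshold, order):
    """Correlate buffers in the given order and stop at the first detection.

    Returns (buffer index or None, number of buffers actually correlated).
    """
    for tried, i in enumerate(order, start=1):
        if abs(peak_of(i)) > threshold:
            return i, tried
    return None, len(order)


if __name__ == "__main__":
    peaks = [1.0, 2.0, 9.5, 0.3]  # illustrative confidence levels per buffer
    best, tried = first_detection(peaks.__getitem__, 8.7, search_order(4))
    print(best, tried)
    # In the next detection window the previous best buffer is tried first.
    print(first_detection(peaks.__getitem__, 8.7, search_order(4, best)))
```

When the time offset changes slowly between detection windows, the second search correlates a single buffer instead of several.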

Within the specification it will be appreciated that the word “comprising” does not exclude other elements or steps, that “a” or “an” does not exclude a plurality, and that a single processor or other unit may fulfil the functions of several means recited in the claims.

1. A method of compensating for offset in a received signal, the signal being modified by a sequence of symbols, each symbol extending over T_(s) signal samples, the method comprising the steps of: (a) dividing the received signal into frames; (b) dividing each frame into a plurality of N_(b) sub-frames; (c) forming N_(b) sequences of values, the values being derived from the corresponding sub-frame within each frame; and (d) taking said N_(b) sequences as successive estimates of a frame sequence correctly aligned to the sequence of symbols.
2. A method as claimed in claim 1, wherein each frame is of predetermined length T_(s).
3. A method as claimed in claim 1, wherein there is an inter-frame overlap.
4. A method as claimed in claim 1, wherein each sub-frame overlaps an adjacent sub-frame.
5. A method as claimed in claim 1, wherein N_(b) lies within the range 2 to 8.
6. A method as claimed in claim 1, wherein the sequence of symbols comprises L_(w) symbols, the received signal being divided into L_(F) frames, wherein L_(F) is an integral multiple of L_(w).
7. A method as claimed in claim 1, wherein said sequence of symbols comprises a sequence of values convolved with a window shaping function that has a band limited frequency behavior and a smooth temporal behavior.
8. A method as claimed in claim 7, wherein said window shaping function has a symmetric or an anti-symmetric temporal behavior.
9. A method as claimed in claim 1, wherein said sequence of symbols comprises a sequence of at least one of raised cosine functions or bi-phase functions.
10. A method as claimed in claim 1, wherein said offset is a time offset.
11. A method as claimed in claim 1, the method further comprising processing each estimate as though it were the correctly aligned frame sequence, so as to determine which estimate is the best estimate.
12. A method as claimed in claim 11, wherein the best estimate is assumed to be the first estimate that, when processed, exceeds one or more predetermined conditions; said processing of estimates stopping once the best estimate has been determined.
13. A method as claimed in claim 1, the method further comprising the step of correlating each of said estimates with a reference corresponding to said sequence of symbols; and taking the estimate with the maximum correlation peak value as the best estimate.
14. A method as claimed in claim 11, wherein once a first best estimate has been determined for a first signal or portion of a signal, the method is repeated for a further received signal or portion of a signal, the estimates from said further signal being processed in an order dependent upon said first best estimate.
15. A computer program arranged to perform the method as claimed in claim 1.
16. A record carrier comprising a computer program as claimed in claim 15.
17. A method of making available for downloading a computer program as claimed in claim 15.
18. An apparatus arranged to compensate for offset in a received signal, the signal being modified by a sequence of symbols, each symbol extending over T_(s) signal samples, the apparatus comprising: a divider arranged to divide the received signal into frames; a divider arranged to divide each frame into a plurality of N_(b) sub-frames; and a processor arranged to form N_(b) sequences of values, the values being derived from the corresponding sub-frame within each frame; and to take said N_(b) sequences as successive estimates of a frame sequence correctly aligned with the sequence of symbols.
19. An apparatus as claimed in claim 18, the apparatus further comprising a buffer arranged to store said N_(b) sequences.
20. A decoder comprising the apparatus as claimed in claim 18.